
Relay WebSocket library refactor #1930

Merged
merged 53 commits into from
Dec 13, 2023
Conversation

lambchr
Collaborator

@lambchr lambchr commented Oct 18, 2023

@PlasmaPower requested that the wsbroadcastserver library be refactored.

Currently the ClientManager will use a single goroutine to register each ClientConnection. Part of that process is sending the entire backlog, which could be very large. This means that whilst the ClientManager is blocked registering a ClientConnection and sending the entire backlog, other clients cannot connect.

The goal of the refactor is to move the logic that sends the backlog from the ClientManager goroutine into each ClientConnection goroutine. We must ensure that no messages are skipped when a ClientConnection switches from the backlog to the out channel. Sending a message twice across that boundary is acceptable, since the client is expected to handle duplicates.

@lambchr lambchr requested a review from PlasmaPower October 18, 2023 04:28
@cla-bot cla-bot bot added the s Automatically added by the CLA bot if the creator of a PR is registered as having signed the CLA. label Oct 18, 2023
Collaborator

@PlasmaPower PlasmaPower left a comment

The overall architecture looks good, but I have some implementation recommendations. I didn't fully go through this PR yet as a heads up, so there might be other stuff I'll recommend in later reviews.

wsbroadcastserver/clientconnection.go (three resolved review threads, outdated)
if cc.compression {
data = compressed.Bytes()
} else {
data = notCompressed.Bytes()
Collaborator

I would say we shouldn't serialize both versions if we don't need to, but we expect compression to be the common case, and the notCompressed serialization is presumably required to generate the compressed output, so this might be fine as-is. Still, if it isn't too difficult, it might be worth only serializing the one we need.

Collaborator Author

I agree that the serializeMessage function is a little confusing. It was already there, so I haven't refactored it. It doesn't actually serialize both versions: it only serializes the one we have chosen, and the other return value is an empty byte slice. I think that leads to confusion and the function should be refactored, but I have chosen to leave it for now so as not to increase the scope of this change.

I'd be happy to refactor the function in this or a future PR though if you'd like? FYI from what I can see I don't think the compressed output requires anything from the notCompressed serialization to function, so it should be easy to split into two functions.

broadcaster/backlog/backlog.go (resolved review thread, outdated)
b.messageCount.Store(0)
log.Warn(err.Error())
} else if errors.Is(err, errSequenceNumberSeen) {
log.Info("ignoring message sequence number (%s), already in backlog", msg.SequenceNumber)
Collaborator

We should check that this message is the same as what we have in the buffer. If not, we need to clear the buffer (or preferably only clear this sequence number and anything after it), log an error saying that a reorg occurred, and then add this message.

Collaborator Author

Agreed, the backlogSegment.append method has an if statement with the following checks:

  1. Is this message the next number in the sequence (previous number + 1)? If so, append it.
  2. Else, is this message larger than the expected next number in the sequence? If so, return the errDropSegments error.
  3. Else, this message must already have been seen, so return the errSequenceNumberSeen error.

Let me know if you think this approach is missing something :)

Collaborator

There's an edge case here which is reorgs. When a reorg occurs, a message can be replaced by a different message with the same sequence number. However, I'm beginning to think that what we should do in reorg cases is too ambiguous and error prone, and we should just rely on the L1 sequencer inbox to sort it out and establish a canonical message ordering, and then pick up the feed from there. I.e. the current approach should be fine. I'll probably discuss this with the nitro team and make sure this approach makes sense to everyone.

Btw, this log.Info line needs to use the key-value logging scheme instead of %s.
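
For illustration, here is a minimal toy sketch of the key-value convention the reviewer is asking for (the real code uses go-ethereum's log package, where log.Info takes alternating key/value pairs after the message instead of format verbs like %s; the infoKV helper below is invented for this example):

```go
package main

import "fmt"

// infoKV formats a message followed by alternating key/value pairs, in the
// style of go-ethereum's log.Info("msg", "key1", val1, "key2", val2, ...).
func infoKV(msg string, kv ...interface{}) string {
	out := msg
	for i := 0; i+1 < len(kv); i += 2 {
		out += fmt.Sprintf(" %v=%v", kv[i], kv[i+1])
	}
	return out
}

func main() {
	// Instead of: log.Info("ignoring message sequence number (%s), ...", seqNum)
	fmt.Println(infoKV("ignoring message, already in backlog", "sequenceNumber", 42))
}
```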

broadcaster/backlog/backlog.go (two resolved review threads, one outdated)
@PlasmaPower PlasmaPower marked this pull request as draft October 18, 2023 04:47
Collaborator

@PlasmaPower PlasmaPower left a comment

I just have a couple of last comments; mainly, we need to be careful to avoid recursively calling RLock due to the way Go implements mutexes.

segment, err := b.Lookup(start)
if start < head.Start() {
// doing this check after the Lookup call ensures there is no race
// condition with a delete call
Collaborator

I'm not sure this quite works because head was already loaded before in b.head.Start(). Maybe you could reload head after the Lookup call?

Collaborator Author

Good shout, changed

noMsgs := []*m.BroadcastFeedMessage{}
if start < s.start.Load() {
if start < s.Start() {
Collaborator

We need to be careful about recursive read locks. They cause deadlocks:

If a goroutine holds a RWMutex for reading and another goroutine might call Lock, no goroutine should expect to be able to acquire a read lock until the initial read lock is released. In particular, this prohibits recursive read locking. This is to ensure that the lock eventually becomes available; a blocked Lock call excludes new readers from acquiring the lock.

https://pkg.go.dev/sync#RWMutex.RLock

I'd suggest having a lowercase start() function which doesn't lock the mutex and is used by both Start() and this function.
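
The suggested pattern can be sketched like this (the segment type and Contains method are simplified stand-ins, not the real backlogSegment): the exported method takes the read lock, the unexported one assumes the caller already holds it, and internal code only ever calls the unexported form, so the RWMutex is never read-locked recursively.

```go
package main

import (
	"fmt"
	"sync"
)

type segment struct {
	lock     sync.RWMutex
	messages []uint64 // sequence numbers, in order
}

// Start takes the read lock and delegates to the unlocked helper.
func (s *segment) Start() uint64 {
	s.lock.RLock()
	defer s.lock.RUnlock()
	return s.start()
}

// start assumes s.lock is already held by the caller.
func (s *segment) start() uint64 {
	if len(s.messages) == 0 {
		return 0
	}
	return s.messages[0]
}

// Contains acquires the read lock once and uses only unlocked helpers inside
// it; calling s.Start() here instead could deadlock against a pending Lock.
func (s *segment) Contains(i uint64) bool {
	s.lock.RLock()
	defer s.lock.RUnlock()
	return len(s.messages) > 0 && i >= s.start() && i <= s.messages[len(s.messages)-1]
}

func main() {
	s := &segment{messages: []uint64{5, 6, 7}}
	fmt.Println(s.Start(), s.Contains(6), s.Contains(9)) // 5 true false
}
```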

Collaborator Author

Good catch! I have changed it over

return s.messages[startIndex:endIndex], nil
tmp := make([]*m.BroadcastFeedMessage, len(s.messages))
copy(tmp, s.messages)
return tmp[startIndex:endIndex], nil
Collaborator

I doubt this would matter too much but ideally we'd create tmp to only be the size of endIndex - startIndex and copy that section

Collaborator Author

I actually tried to do that with:

tmp := make([]*m.BroadcastFeedMessage, endIndex-startIndex)
copy(tmp, s.messages[startIndex:endIndex])
return tmp, nil

but for some reason it seemed to copy messages from the s.messages slice outside of the startIndex to endIndex range that I had specified. I'm not sure why and maybe I did something wrong. For now I will leave this as is as it fixed the errors I saw in the test.

Collaborator Author

Actually, I must have tried something else when copying just that section of the slice, because I added it back and it passes the tests. I think earlier I might have used a different length than endIndex-startIndex when making the tmp slice.

s.messagesLock.RLock()
defer s.messagesLock.RUnlock()
start := s.Start()
if i < start || i > s.End() {
Collaborator

Same thing in this function about recursive locking

Collaborator Author

Good catch! I have changed it over

Collaborator

I think there are some calls to End that have the same issue.

Collaborator

@PlasmaPower PlasmaPower left a comment

LGTM

@PlasmaPower PlasmaPower marked this pull request as ready for review December 13, 2023 23:02
@PlasmaPower PlasmaPower merged commit fd8d0b4 into master Dec 13, 2023
8 checks passed
@PlasmaPower PlasmaPower deleted the cl/relay-refactor branch December 13, 2023 23:02
@yucem44 yucem44 mentioned this pull request Dec 16, 2023